Abstract

Mediation analysis aims to uncover causal pathways along which
changes are transmitted from stimulus to response. Recent advances
in causal inference have given rise to a general and easy-to-use
estimator for assessing the extent to which the effect of one variable on
another is mediated by a third, thus setting a causally-sound standard
for mediation analysis of empirical data. This estimator, called the Mediation
Formula, is applicable to nonlinear models with both discrete and
continuous variables, and permits the evaluation of path-specific effects
with minimal assumptions regarding the data-generating process. We
demonstrate the use of the Mediation Formula in simple examples and
illustrate why parametric methods of analysis yield distorted results,
even when parameters are known precisely. We stress the importance
of distinguishing between the necessary and sufficient interpretations
of “mediated-effect” and show how to estimate the two components in

1 Mediation: Direct and Indirect Effects

1.1 Direct versus Total Effects

The target of many empirical studies in the social, behavioral, and health
sciences is the causal effect, here denoted $P(y \mid do(x))$, which measures the
total effect of a manipulated variable (or a set of variables) X on a response
variable Y. In many cases, this quantity does not adequately represent the
target of investigation, and attention is focused instead on the direct effect of
X on Y. The term “direct effect” is meant to quantify an effect that is not
mediated by other variables in the model or, more accurately, the sensitivity
of Y to changes in X while all other factors in the analysis are held fixed.
Naturally, holding those factors fixed would sever all causal paths from X to
Y with the exception of the direct link X → Y, which is not intercepted by
any intermediaries.

A classical example of the ubiquity of direct effects involves legal disputes
over race or sex discrimination in hiring. Here, neither the effect of sex or race
on applicants’ qualification nor the effect of qualification on hiring is a target
of litigation. Rather, defendants must prove that sex and race do not directly
influence hiring decisions, whatever indirect effects they might have on hiring
by way of applicant qualification.

From a policy-making viewpoint, an investigator may be interested in
decomposing effects to quantify the extent to which weakening or strengthening
specific causal pathways would impact the overall effect of X on Y: for
example, the extent to which minimizing racial disparity in education would
reduce racial disparity in earnings or, taking a health-related example, the
extent to which efforts to eliminate the side effects of a given treatment are
likely to weaken or enhance the efficacy of that treatment. More often, however,
the decomposition of effects into their direct and indirect components carries
theoretical scientific importance, for it tells us “how nature works” and,
therefore, enables us to predict behavior under a rich variety of conditions and
interventions.

Structural equation models provide a natural language for analyzing
path-specific effects and, indeed, a considerable literature on direct, indirect and
total effects has been authored by SEM researchers (Alwin and Hauser (1975),
Graff and Schmidt (1982), Sobel (1987), Bollen (1989)), for both recursive and
nonrecursive models. This analysis usually involves sums of powers of coefficient
matrices, where each matrix represents the path coefficients associated with
the structural equations.

Yet despite its ubiquity, the analysis of mediation has long been a thorny
issue in the empirical sciences (Judd and Kenny, 1981; Baron and Kenny,
1986; Muller et al., 2005; Shrout and Bolger, 2002; MacKinnon et al., 2007a),
primarily because structural equation modeling in those sciences was deeply
entrenched in linear analysis, where the distinction between causal parameters
and their regressional interpretations can easily be conflated (as in Holland,
1995; Sobel, 2008). The difficulties were further amplified in nonlinear models,
where sums and products are no longer applicable. As demands grew to
tackle problems involving binary and categorical variables, researchers could
no longer define direct and indirect effects in terms of structural or regressional
coefficients, and all attempts to extend the linear paradigms of effect
decomposition to nonlinear systems produced distorted results (MacKinnon et al.,
2007b). These difficulties have accentuated the need to redefine and derive
causal effects from first principles, uncommitted to distributional assumptions
or a particular parametric form of the equations. The structural methodology
presented in this paper adheres to this philosophy, and it has indeed produced
a principled solution to the mediation problem, based on the counterfactual
reading of structural equations (Balke and Pearl, 1994a, b; Pearl, 2009a,
Chapter 7). The following subsections summarize the method and its solution,
while Section 2 introduces the Mediation Formula, exemplifies its behavior,
and demonstrates its usage in simple examples, including linear, quasi-linear,
logistic, probit and nonparametric models. Finally, Section 4 compares the
Mediation Formula to other methods proposed for effect decomposition and
explains the difficulties that those methods have encountered in defining and
assessing mediated effects.

1.2 Controlled Direct Effects

Figure 1: (a) A generic model depicting mediation through Z with no
confounders, and (b) with two confounders, W1 and W2.

A major impediment to progress in mediation analysis has been the lack of
notational facility for expressing the key notion of “holding the mediating
variables fixed” in the definition of direct effect. Clearly, this notion must be
interpreted as (hypothetically) setting the intermediate variables to constants
by physical intervention, not by analytical means such as selection, regression
conditioning, stratification, matching or adjustment. For example, consider
the simple mediation model of Fig. 1(a), where the error terms (not shown
explicitly) are assumed to be mutually independent. To measure the direct
effect of X on Y it is sufficient to measure their association conditioned on the
mediator Z. In Fig. 1(b), however, where the error terms are dependent, it
will not be sufficient to measure the association between X and Y for a given
level of Z because, by conditioning on the mediator Z, we create spurious
associations between X and Y through W2, even when there is no direct effect
of X on Y (Pearl, 1998; Cole and Hernan, 2002).

The do(x) notation enables us to express correctly the notion of
“holding Z fixed” and to obtain a simple definition of the controlled direct effect
of the transition from X = x to X = x' (Pearl, 2009a, p. 128):

$$CDE = E(Y \mid do(x'), do(z)) - E(Y \mid do(x), do(z)) \qquad (1)$$

or, equivalently, using counterfactual notation:

$$CDE = E(Y_{x',z}) - E(Y_{x,z})$$


where Z is the set of all mediating variables.¹ Readers can easily verify that,
in linear systems, the controlled direct effect reduces to the path coefficient
of the link X → Y regardless of whether confounders are present (as in Fig.
1(b)) and regardless of whether the error terms are correlated or not.

This separates the task of definition from that of identification, and thus
circumvents many pitfalls in this area of research (Pearl, 2009b). The
identification of CDE would depend, of course, on whether confounders are present
and whether they can be neutralized by adjustment, but these do not alter
its definition. Nor should trepidation about the infeasibility of the action
do(gender = male) enter the definitional phase of the study. Definitions apply
to symbolic models, not to human biology.²

¹Readers not familiar with this notation can consult (Pearl, 2009a, b, 2010). Conceptually,
$P(y \mid do(x))$ stands for the probability of Y = y in a randomized experiment where the
treatment level is set to X = x, while $Y_x(u)$ stands for the value that Y would attain in
unit u, had X been x. Formally, $P(y \mid do(x))$ and $Y_x(u)$ are defined, respectively, as
the probability and value of variable Y in a modified structural model, in which the equation
for X is replaced by a constant X = x. This model encodes a system of natural laws that
accommodates counterfactuals with non-manipulable antecedents (e.g., race and sex), and is
immune to the conceptual difficulties elaborated in VanderWeele and Hernan (2011).

Graphical identification conditions for expressions of the type
$E(Y \mid do(x), do(z_1), do(z_2), \ldots, do(z_k))$ in the presence of unmeasured
confounders were derived by Pearl and Robins (1995) and invoke sequential
application of the back-door condition (Pearl, 2009a, pp. 252-254), which is
somewhat more powerful than G-computation (Robins, 1986). Tian and Shpitser (2010)
have further derived a necessary and sufficient condition for this task, and thus
resolved the identification problem for controlled direct effects (Eq. 1).

1.3 Natural Direct Effects

In linear systems, the direct effect is fully specified by the path coefficient
attached to the link from X to Y; therefore, the direct effect is independent
of the values at which we hold Z. In nonlinear systems, those values would,
in general, modify the effect of X on Y and thus should be chosen carefully to
represent the target policy under analysis. For example, it is not uncommon
to find employers who prefer males for the high-paying jobs (i.e., high z)
and females for low-paying jobs (low z). Focusing on one of these values of
Z, or averaging over all values, would not capture the underlying pattern of
discrimination.
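To make the dependence on z concrete, the sketch below evaluates Eq. (1) exactly in a hypothetical binary structural model; the equation for Y and the probability of its exogenous term U_Y are assumptions chosen for illustration. Setting X and Z by intervention replaces their equations with constants, so only Y's equation and the distribution of U_Y enter the computation.

```python
from fractions import Fraction

# Hypothetical structural equation for Y (assumption, for illustration only):
#   Y = 1 if (X = 1 and Z = 1) or U_Y = 1, else 0,   with P(U_Y = 1) = 1/10.
P_UY = Fraction(1, 10)


def f_y(x, z, u_y):
    return 1 if (x == 1 and z == 1) or u_y == 1 else 0


def mean_y_do(x, z):
    """E(Y | do(x), do(z)): X and Z are fixed by intervention, so we only
    average Y's equation over the exogenous U_Y."""
    return sum(p * f_y(x, z, u_y) for u_y, p in ((0, 1 - P_UY), (1, P_UY)))


def cde(x, x_prime, z):
    """Controlled direct effect, Eq. (1), at mediator level z."""
    return mean_y_do(x_prime, z) - mean_y_do(x, z)


for z in (0, 1):
    print(z, cde(0, 1, z))  # prints "0 0" then "1 9/10"
```

In this model the effect of X on Y operates only in interaction with Z, so the controlled direct effect is 0 at z = 0 and 9/10 at z = 1; no single level of z summarizes the pattern.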

When the direct effect is sensitive to the levels at which we hold Z, it is
often more meaningful to define the direct effect relative to some “natural”
baseline level that may vary from individual to individual, and represents the
level of Z just before the change in X. Conceptually, we can define the natural
direct effect $DE_{x,x'}(Y)$ as the expected change in Y induced by changing X
from x to x' while keeping all mediating factors constant at whatever value
they would have obtained under do(x). This hypothetical change, which Robins
and Greenland (1992) conceived and called “pure” and Pearl (2001) formalized
and analyzed under the rubric “natural,” mirrors what lawmakers instruct us
to consider in race or sex discrimination cases: “The central question in any
employment-discrimination case is whether the employer would have taken
the same action had the employee been of a different race (age, sex, religion,
national origin etc.) and everything else had been the same.” (In Carson
versus Bethlehem Steel Corp., 70 FEP Cases 921, 7th Cir. (1996).)

²In reality, it is the employer’s perception of the applicant’s gender and his/her assessment
of gender-job compatibility that render gender a “cause” of hiring; manipulation of gender
is not needed.
Thus, whereas the controlled direct effect measures the effect of X on Y
while holding Z fixed at a uniform level (z) for all units,³ the natural direct
effect allows z to vary from individual to individual and be fixed at the level
that each individual held naturally, just before the change in X.

Pearl (2001) gave the following definition for the “natural direct effect”:

$$DE_{x,x'}(Y) = E(Y_{x',Z_x}) - E(Y_x) \qquad (2)$$

Here, $Y_{x',Z_x}$ represents the value that Y would attain under the operation of
setting X to x' and, simultaneously, setting Z to whatever value it would have
obtained under the setting X = x. For example, if one were to estimate that
the natural direct effect of gender on hiring equals 20% of the total effect,
one could infer that 20% of the current gender-related disparity in hiring can be
eliminated by making hiring decisions gender-blind, while keeping applicants’
qualifications at their current values (which may be gender dependent).

We see from (2) that $DE_{x,x'}(Y)$, the natural direct effect of the transition
from x to x', involves probabilities of nested counterfactuals and cannot be
written in terms of the do(x) operator. Therefore, the natural direct effect
cannot in general be identified or estimated, even with the help of ideal,
controlled experiments, a point emphasized in Robins and Greenland (1992).⁴
However, aided by the formal definition of Eq. (2) and the notational power of
nested counterfactuals, Pearl (2001) was nevertheless able to derive conditions
under which the natural direct effect can be expressed in terms of the do(x)
operator, implying identifiability from controlled experiments. For example,
if a set W exists that deconfounds Y and Z, the natural direct effect can be
reduced to⁵

$$DE_{x,x'}(Y) = \sum_{z,w} \left[ E(Y \mid do(x',z), w) - E(Y \mid do(x,z), w) \right] P(z \mid do(x), w) \, P(w) \qquad (3)$$

³In the hiring discrimination example, this would amount, for example, to testing gender
bias while marking all application forms with the same level of schooling and other
skill-defining attributes.

⁴The reason being that we cannot rerun history and test individuals’ response both
before and after the intervention. Robins (2003) elaborates on the differences between
the assumptions made in (Pearl, 2001) and the weaker assumptions made in Robins and
Greenland (1992), which prevented the latter from identifying natural effects even in the
simple case of no confounding (Fig. 1(a)).

⁵The key condition for this reduction is the existence of a set W of covariates,
nondescendants of X, satisfying $Y_{x,z} \perp\!\!\!\perp Z_{x'} \mid W$, which simply states that W blocks all
back-door paths from Z to Y except the one through X (see Pearl (2009a, p. 101)). More refined
counterfactual conditions for identification are derived in Petersen et al. (2006), Imai et al.
(2010c), and Robins and Richardson (2011). However, none matches the clarity of the
back-door condition above, and all are equivalent in the graphical language of nonparametric
structural equations (Shpitser and VanderWeele, 2011).
The intuition is simple: the w-specific natural direct effect is the weighted
average of the controlled direct effect, using the causal effect P(z | do(x), w) as
a weighting function.⁶

In particular, it can be shown (Pearl, 2001) that the expression in Eq. (3)
is identifiable in Markovian models (i.e., acyclic graphs with no unobserved
confounders), since each do-expression can be reduced to a “do-free” expression
by covariate adjustment (Pearl, 2009a) and then estimated by regression. For
example, for the model in Fig. 1(b), $DE_{x,x'}(Y)$ reduces to:

$$DE_{x,x'}(Y) = \sum_{z} \sum_{w_2} P(w_2) \left[ E(Y \mid x', z, w_2) - E(Y \mid x, z, w_2) \right] \sum_{w_1} P(z \mid x, w_1, w_2) \, P(w_1) \qquad (4)$$

while for the confounding-free model of Fig. 1(a) we have:

$$DE_{x,x'}(Y) = \sum_{z} \left[ E(Y \mid x', z) - E(Y \mid x, z) \right] P(z \mid x) \qquad (5)$$

Both (4) and (5) can be estimated by regression.
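As an illustration of the plug-in evaluation of Eq. (4), the sketch below uses hypothetical, fully specified population quantities for binary X, Z, W1 and W2. All numbers are assumptions, as is the independence of W1 and W2 implied by the factor P(w1) in Eq. (4); in an application each table would be replaced by a regression estimate.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical population quantities (assumed values, binary variables).
P_W1 = {0: F(1, 2), 1: F(1, 2)}
P_W2 = {0: F(3, 4), 1: F(1, 4)}


def p_z(z, x, w1, w2):
    """P(Z = z | x, w1, w2), hypothetical."""
    q = F(1, 10) + F(2, 5) * x + F(1, 5) * w1 + F(1, 5) * w2
    return q if z == 1 else 1 - q


def e_y(x, z, w2):
    """E(Y | x, z, w2), hypothetical, including an x*z interaction."""
    return F(1, 10) + F(1, 10) * x + F(1, 5) * z + F(3, 10) * x * z + F(1, 5) * w2


def natural_direct_effect(x, x_prime):
    """Eq. (4): sum over z, w2 of P(w2) * [E(Y|x',z,w2) - E(Y|x,z,w2)]
    * sum over w1 of P(z|x,w1,w2) P(w1)."""
    total = F(0)
    for z, w2 in product((0, 1), repeat=2):
        # Inner sum over w1: the adjusted weight for mediator level z.
        weight = sum(p_z(z, x, w1, w2) * P_W1[w1] for w1 in (0, 1))
        contrast = e_y(x_prime, z, w2) - e_y(x, z, w2)
        total += P_W2[w2] * contrast * weight
    return total


print(natural_direct_effect(0, 1))  # 7/40
```

Exact fractions are used only so that the result can be checked by hand; floating-point estimates would be used in practice.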

When Z consists of multiple interconnected mediators, affected by an intricate
network of observed and unobserved confounders, the adjustment illustrated
in Eq. (4) must be handled with some care. Theorems 1 and 2 of Pearl
(2001) can then be used to reduce $DE_{x,x'}(Y)$ to a do-expression similar to (3)
(see footnote 5). Once reduced, the machinery of do-calculus (Pearl, 1995) can
be invoked, and the methods of Pearl and Robins (1995), Tian and Shpitser
(2010), and Shpitser and VanderWeele (2011) can select the proper set of
covariates and reduce the natural direct effect (3) to an expression estimable by
regression, whenever such reduction is feasible. For example, if in Fig. 1(b) W1
is unobserved and another observed covariate, W3, mediates the path X → Z,
the factor P(z | do(x), w2), represented by the last sum in Eq. (4), would then be
identifiable through the front-door formula (Pearl, 1995, 2009a), thus rendering
$DE_{x,x'}(Y)$ estimable by regression. This demonstrates that neither “ignorability”
(Rosenbaum and Rubin, 1983) nor “sequential ignorability” (Imai et al., 2010a, c)
is necessary for securing the identification of direct effects; transparent
graph-based criteria are sufficient for determining when and how confounding can
be controlled. See (Pearl, 2009a, pp. 341-344) for a graphical interpretation of
“ignorability” assumptions.

⁶Throughout this paper we will use summation signs with the understanding that integrals
should be used whenever the summed variables are continuous.
1.4 Indirect effects

Remarkably, the definition of the natural direct effect (2) can be turned around
to provide an operational definition for the indirect effect, a concept shrouded
in mystery and controversy because it is impossible, by controlling any of the
variables in the model, to disable the direct link from X to Y so as to let X
influence Y solely via indirect paths.

The natural indirect effect, IE, of the transition from x to x' is defined
as the expected change in Y effected by holding X constant, at X = x, and
changing Z to whatever value it would have attained had X been set to X = x'.
Formally, this reads (Pearl, 2001):

$$IE_{x,x'}(Y) \triangleq E(Y_{x,Z_{x'}}) - E(Y_x) \qquad (6)$$

which is almost identical to the direct effect (Eq. (2)) save for exchanging x
and x' in the first term.
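When the structural model is fully known, the nested counterfactuals in Eqs. (2) and (6) can be evaluated unit by unit: compute the value Z would take under one setting of X, then feed it into Y's equation under the other setting. The sketch below does this by enumerating the exogenous terms of a hypothetical binary model with no confounding; the equations and the probabilities of U_Z and U_Y are assumptions for illustration only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical binary SCM (assumption, for illustration only), no confounding:
#   Z = x XOR U_Z          (U_Z = 1 flips the mediator, P(U_Z = 1) = 1/5)
#   Y = (x AND z) OR U_Y   (U_Y = 1 forces Y = 1,      P(U_Y = 1) = 1/10)
P_UZ, P_UY = Fraction(1, 5), Fraction(1, 10)


def f_z(x, u_z):
    return x ^ u_z


def f_y(x, z, u_y):
    return 1 if (x == 1 and z == 1) or u_y == 1 else 0


def units():
    """Yield each unit u = (u_z, u_y) together with its probability P(u)."""
    for u_z, u_y in product((0, 1), repeat=2):
        p = (P_UZ if u_z else 1 - P_UZ) * (P_UY if u_y else 1 - P_UY)
        yield u_z, u_y, p


def mean_nested(x_for_y, x_for_z):
    """E(Y_{x_for_y, Z_{x_for_z}}): Y's equation sees x_for_y, while Z takes the
    value it would have had, in the same unit, under X = x_for_z."""
    return sum(p * f_y(x_for_y, f_z(x_for_z, u_z), u_y) for u_z, u_y, p in units())


def natural_de(x, x_prime):
    return mean_nested(x_prime, x) - mean_nested(x, x)  # Eq. (2)


def natural_ie(x, x_prime):
    return mean_nested(x, x_prime) - mean_nested(x, x)  # Eq. (6)


print(natural_de(0, 1), natural_ie(0, 1))  # 9/50 0
```

Here $IE_{0,1}(Y) = 0$ because, with X held at 0, Y's equation ignores Z; the mediated influence of X in this model operates only in interaction with X = 1.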

Invoking the same conditions that led to the experimental identification of
the direct effect, Eq. (3), we obtain a parallel formula for the indirect effect:

$$IE_{x,x'}(Y) = \sum_{z,w} E(Y \mid do(x,z), w) \left[ P(z \mid do(x'), w) - P(z \mid do(x), w) \right] P(w) \qquad (7)$$

The intuition here is somewhat different, and represents a nonlinear version
of the “product-of-coefficients” strategy in linear models (MacKinnon, 2008):
the E(Y | do(x, z), w) term encodes the effect of Z on Y for fixed X = x and
W = w, while the [P(z | do(x'), w) − P(z | do(x), w)] term encodes the effect of X
on Z. We see that what was a simple product of coefficients in linear models
turns into a convolution-type operation, involving all values of Z.

In non-experimental studies, the do-operator needs to be reduced to a
regression-type expression using covariate adjustment or instrumental variable methods.
For example, for the model in Fig. 1(b), Eq. (7) reads:

$$IE_{x,x'}(Y) = \sum_{z} \sum_{w_2} P(w_2) \, E(Y \mid x, z, w_2) \sum_{w_1} \left[ P(z \mid x', w_1, w_2) - P(z \mid x, w_1, w_2) \right] P(w_1) \qquad (8)$$

while for the confounding-free model of Fig. 1(a) we have

$$IE_{x,x'}(Y) = \sum_{z} E(Y \mid x, z) \left[ P(z \mid x') - P(z \mid x) \right] \qquad (9)$$

which, like Eq. (5), can be estimated by a two-step regression.


1.5 Effect decomposition

Not surprisingly, owing to the nonlinear nature of the model, the relationship
between the total, direct and indirect effects is non-additive. Indeed, it can
be shown that, in general, the total effect TE of a transition is equal to the
difference between the direct effect of that transition and the indirect effect of
the reverse transition. Formally,

$$TE_{x,x'}(Y) = E(Y_{x'} - Y_x) = DE_{x,x'}(Y) - IE_{x',x}(Y) \qquad (10)$$

In linear systems, where reversal of transitions amounts to negating the signs
of their effects, we have the standard additive formula

$$TE_{x,x'}(Y) = DE_{x,x'}(Y) + IE_{x,x'}(Y) \qquad (11)$$

Since each term above is based on an independent operational definition, this
equality constitutes a formal justification for the additive formula used
routinely in linear systems.⁷
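The non-additivity is easy to check numerically. Assuming the structure of Fig. 1(a), so that Eqs. (5) and (9) apply and the total effect is E(Y | x') − E(Y | x), the sketch below uses hypothetical tables for P(Z = 1 | x) and E(Y | x, z) with binary X and Z (the numbers are assumptions). Exact fractions make Eq. (10) hold exactly, while the additive formula (11) fails.

```python
from fractions import Fraction as F

# Hypothetical observational quantities for binary X and Z (assumed values).
P_Z1 = {0: F(2, 5), 1: F(3, 4)}                  # P(Z = 1 | x)
G = {(0, 0): F(1, 5), (0, 1): F(3, 10),          # g(x, z) = E(Y | x, z)
     (1, 0): F(2, 5), (1, 1): F(4, 5)}


def p_z(z, x):
    return P_Z1[x] if z == 1 else 1 - P_Z1[x]


def de(x, xp):  # Eq. (5)
    return sum((G[xp, z] - G[x, z]) * p_z(z, x) for z in (0, 1))


def ie(x, xp):  # Eq. (9)
    return sum(G[x, z] * (p_z(z, xp) - p_z(z, x)) for z in (0, 1))


def te(x, xp):  # E(Y | x') - E(Y | x), with E(Y | x) = sum_z g(x, z) P(z | x)
    return sum(G[xp, z] * p_z(z, xp) - G[x, z] * p_z(z, x) for z in (0, 1))


assert te(0, 1) == de(0, 1) - ie(1, 0)  # Eq. (10) holds exactly
print(te(0, 1), de(0, 1) + ie(0, 1))    # 23/50 71/200: Eq. (11) fails here
```

The interaction in g(x, z) (the effect of z is larger when x = 1) is what breaks additivity; with an additive g the two printed values would coincide.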

Note that, although it cannot in general be expressed in do-notation, the
indirect effect has clear policy-making implications. For example, in the hiring
discrimination context, a policy maker may be interested in predicting the
gender mix in the work force if gender bias is eliminated and all applicants
are treated equally, say, the same way that males are currently treated. This
quantity will be given by the indirect effect of gender on hiring, mediated by
factors such as education and aptitude, which may be gender-dependent.

More generally, a policy maker may be interested in the effect of issuing
a directive to a select set of subordinate employees, or in carefully selecting
the routing of messages in a network of interacting agents. Such applications
motivate the analysis of path-specific effects, that is, the effect of X on Y
through a selected set of paths, with all other paths deactivated (Avin et al.,
2005). The operation of disabling a path can be expressed in nested
counterfactual notation, as in Eqs. (2) and (6).

In all these cases, the policy intervention invokes the selection of signals
to be sensed, rather than variables to be fixed. Pearl (2001) has therefore
suggested that signal sensing is more fundamental to the notion of causation
than manipulation, the latter being but a crude way of stimulating the former
in an experimental setup. The mantra “No causation without manipulation”
must be rejected. (See Pearl, 2009a, Section 11.4.5.)

⁷Some authors (e.g., VanderWeele (2009) and Vansteelandt (2011, Chapter 4.4)) take
Eq. (11) as the definition of indirect effect (see footnote 8), which ensures additivity by
definition but presents a problem of interpretation: the resulting indirect effect, aside from
being redundant, does not represent the same transition, from x to x', as do the total and
direct effects. This prevents us from comparing the effect attributable to mediating paths
with that attributable to unmediated paths, under the same conditions.




2 The Mediation Formula: A Simple Solution to a Thorny Problem

2.1 Mediation in non-parametric models

This subsection demonstrates how the solution provided in equations (5) and
(9) can be applied in assessing mediation effects in non-parametric, possibly
nonlinear models. We will use the simple mediation model of Fig. 1(a), where
all error terms (not shown explicitly) are assumed to be mutually independent,
with the understanding that adjustment for appropriate sets of covariates W
may be necessary to achieve this independence (as in (4) and (8)) and that
integrals should replace summations when dealing with continuous variables
(Imai et al., 2010c).

Combining (5), (9), and (10), the expressions for the direct (DE), indirect
(IE) and total (TE) effects become:

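$$DE_{x,x'}(Y) = \sum_{z} \left[ E(Y \mid x', z) - E(Y \mid x, z) \right] P(z \mid x) \qquad (12)$$

$$IE_{x,x'}(Y) = \sum_{z} E(Y \mid x, z) \left[ P(z \mid x') - P(z \mid x) \right] \qquad (13)$$

$$TE_{x,x'}(Y) = E(Y \mid x') - E(Y \mid x) \qquad (14)$$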

These three equations provide general formulas for mediation effects,
applicable to any nonlinear system, any distribution, and any type of variables.
Moreover, the formulas are readily estimable by regression. Owing to their
generality and ubiquity, I have referred to these expressions as the “Mediation
Formula” (Pearl, 2009b).

The Mediation Formula (13) represents the average increase in the outcome
Y that the transition from X = x to X = x' is expected to produce absent
any direct effect of X on Y. Though based on solid causal principles, it
embodies no causal assumption other than the generic mediation structure of
Fig. 1(a). When the outcome Y is binary (e.g., recovery, or hiring), the ratio
(1 − IE/TE) represents the fraction of responding individuals who owe their
response to direct paths, while (1 − DE/TE) represents the fraction who owe
their response to Z-mediated paths.⁸

⁸For simplicity and clarity, we remove the subscripts from TE, DE, and IE whenever
no ambiguity arises. Robins (2003) and Hafeman and Schwartz (2009) refer to TE − IE and
TE − DE as “total direct” and “total indirect” effects, respectively.

The Mediation Formula tells us that IE depends only on the expectation of
the counterfactual $Y_{xz}$, not on its functional form $f_Y(x, z, u_Y)$ or its distribution
$P(Y_{xz} = y)$. It calls therefore for a two-step regression which, in principle, can
be performed non-parametrically. In the first step we regress Y on X and Z,
and obtain the estimate

$$g(x, z) = E(Y \mid x, z)$$

for every (x, z) cell. In the second step we fix x and estimate the conditional
expectation of g(x, z) with respect to z, conditional on X = x' and X = x,
respectively, and take the difference:

$$IE_{x,x'}(Y) = E_{z \mid x'}[g(x, z)] - E_{z \mid x}[g(x, z)]$$
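The two-step procedure can be sketched in a few lines of Python for discrete X and Z. This is a minimal illustration, assuming the unconfounded structure of Fig. 1(a) and that every mediator value observed in the X = x' group also occurs in the X = x group (otherwise g(x, z) is undefined for that cell). The function name `two_step_ie` and the toy records are hypothetical.

```python
from collections import defaultdict


def two_step_ie(records, x, x_prime):
    """Nonparametric two-step estimate of IE_{x,x'}(Y), Eq. (13), from
    (x, z, y) records with discrete X and Z (hypothetical helper)."""
    # Step 1: g(x, z) = E(Y | x, z), estimated by cell means.
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, zi, yi in records:
        sums[xi, zi] += yi
        counts[xi, zi] += 1
    g = {cell: sums[cell] / counts[cell] for cell in counts}

    # Step 2: average g(x, Z) over the Z values observed in a given X group.
    def mean_g_over_z(group):
        zs = [zi for xi, zi, _ in records if xi == group]
        return sum(g[x, zi] for zi in zs) / len(zs)

    return mean_g_over_z(x_prime) - mean_g_over_z(x)


# Toy (x, z, y) records, made up for illustration.
records = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 1),
           (1, 0, 0), (1, 1, 1), (1, 1, 1), (1, 1, 0)]
print(two_step_ie(records, 0, 1))  # 0.125
```

Averaging over the observed Z values in each group is the empirical counterpart of the conditional expectations $E_{z \mid x'}$ and $E_{z \mid x}$.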

Non-parametric estimation is not always practical. When Z consists of
a vector of several mediators, the dimensionality of the problem might
prohibit the estimation of E(Y | x, z) for every (x, z) cell, and the need arises to
use parametric approximation. We can then choose any convenient parametric
form for E(Y | x, z) (e.g., linear, quasi-linear, logit, probit), estimate the
parameters separately (e.g., by regression or maximum likelihood methods),
insert the parametric approximation into (13) and estimate its two conditional
expectations (over z) to get the mediated effect.
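For instance, with a logistic form for E(Y | x, z) and a continuous mediator, the two conditional expectations in (13) can be approximated by averaging the fitted g(x, z) over the observed Z values in the X = x' and X = x groups. The sketch below assumes the coefficients have already been estimated; their values, the interaction term, and the simulated Gaussian mediator are hypothetical.

```python
import math
import random

# Hypothetical fitted coefficients for E(Y | x, z) = logistic(b0 + bx*x + bz*z + bxz*x*z).
B0, BX, BZ, BXZ = -1.0, 0.5, 0.8, 0.3


def g(x, z):
    """Parametric (logistic) approximation of E(Y | x, z)."""
    return 1.0 / (1.0 + math.exp(-(B0 + BX * x + BZ * z + BXZ * x * z)))


def parametric_ie(z_sample_x, z_sample_x_prime, x):
    """Plug-in version of Eq. (13): mean of g(x, Z) over the Z values observed
    with X = x', minus its mean over the Z values observed with X = x."""
    mean_x_prime = sum(g(x, z) for z in z_sample_x_prime) / len(z_sample_x_prime)
    mean_x = sum(g(x, z) for z in z_sample_x) / len(z_sample_x)
    return mean_x_prime - mean_x


# Simulated mediator values (assumption): Z given X = x is Gaussian with mean x.
rng = random.Random(0)
z_given_0 = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
z_given_1 = [rng.gauss(1.0, 1.0) for _ in range(10_000)]
print(parametric_ie(z_given_0, z_given_1, x=0))
```

Note that g is evaluated at the fixed value x in both averages; only the distribution of Z changes between the two terms, as Eq. (13) requires.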

The power of the Mediation Formula was recognized by Petersen et al.
(2006); Glynn (2009); Hafeman and Schwartz (2009); Mortensen et al. (2009);
VanderWeele (2009); Kaufman (2010); Imai et al. (2010c). Imai et al. (2010a, c)
have further shown that nonparametric identification of mediation effects under
the no-confounding assumption (Fig. 1(a)) allows for a flexible estimation
strategy, and illustrate this with various nonlinear models, quantile regressions,
and generalized additive models. Imai et al. (2010b) describe an implementation
of these extensions using a convenient R package. Sjolander (2009)
provides bounds on DE in cases where the confounders between Z and Y cannot
be controlled.

In the next section this power will be demonstrated on linear and nonlinear
models, with the aim of explaining the distortions produced by conventional
methods of parametric mediation analysis, and how they are rectified through
the Mediation Formula.
2.2 Mediation effects in linear, logistic, and probit models

The linear case: Difference versus product estimation

Let us examine what the Mediation Formula yields when applied to the linear
version of model 1(a), which reads:

$$\begin{aligned}
x &= u_X \\
z &= \gamma_{xz} x + u_Z \\
y &= \gamma_{xy} x + \gamma_{zy} z + u_Y
\end{aligned}$$

Computing the conditional expectation in (13) gives

$$g(x, z) = E(Y \mid x, z) = E(\gamma_{xy} x + \gamma_{zy} z + u_Y) = \alpha_0 + \gamma_{xy} x + \gamma_{zy} z$$

(where $\alpha_0 = E(u_Y)$) and yields

$$\begin{aligned}
DE_{x,x'}(Y) &= \sum_{z} \left[ (\alpha_0 + \gamma_{xy} x' + \gamma_{zy} z) - (\alpha_0 + \gamma_{xy} x + \gamma_{zy} z) \right] P(z \mid x) \\
&= \gamma_{xy} (x' - x) && (16)
\end{aligned}$$

$$\begin{aligned}
IE_{x,x'}(Y) &= \sum_{z} (\alpha_0 + \gamma_{xy} x + \gamma_{zy} z) \left[ P(z \mid x') - P(z \mid x) \right] \\
&= \gamma_{zy} \left[ E(Z \mid x') - E(Z \mid x) \right] \\
&= (\gamma_{zy} \gamma_{xz})(x' - x) && (17) \\
&= (\beta_{xy} - \gamma_{xy})(x' - x) && (18)
\end{aligned}$$

where $\beta_{xy}$ is the regression coefficient $\beta_{xy} = \frac{\partial}{\partial x} E(Y \mid x) = \gamma_{xy} + \gamma_{xz} \gamma_{zy}$. Finally,

$$\begin{aligned}
TE_{x,x'}(Y) &= E(Y \mid x') - E(Y \mid x) \\
&= \sum_{z} E(Y \mid x', z) P(z \mid x') - \sum_{z} E(Y \mid x, z) P(z \mid x) \\
&= \sum_{z} (\alpha_0 + \gamma_{xy} x' + \gamma_{zy} z) P(z \mid x') - \sum_{z} (\alpha_0 + \gamma_{xy} x + \gamma_{zy} z) P(z \mid x) \\
&= \gamma_{xy} (x' - x) + \gamma_{zy} E(Z \mid x') - \gamma_{zy} E(Z \mid x) \\
&= (\gamma_{xy} + \gamma_{zy} \gamma_{xz})(x' - x) && (19)
\end{aligned}$$

We thus obtain the standard expressions for effects in linear systems.
In particular, we see that the indirect effect can be estimated either as a
difference in two regression coefficients (Eq. 18) or a product of two regression
coefficients (Eq. 17), with Y regressed on both X and Z.⁹ When generalized
to nonlinear systems, however, these two strategies yield conflicting results
(MacKinnon and Dwyer, 1993; MacKinnon et al., 2007b), and much controversy
has developed as to which strategy should be used in assessing the size
of mediation effects (MacKinnon and Dwyer, 1993; Freedman et al., 1992;
Molenberghs et al., 2002; MacKinnon et al., 2007b; Glynn, 2009; Green et al.,
2010).

We now show that neither of these strategies generalizes to nonlinear
systems; direct application of (13) is necessary. Moreover, we will see that, though
yielding identical results in linear systems, the two strategies represent
legitimate intuitions in pursuit of two distinct causal quantities. The
difference-in-coefficients method seeks to estimate TE − DE, while the
product-of-coefficients method seeks to estimate IE. The former represents the reduction
in TE if indirect paths were deactivated, while the latter represents the
portion of TE that would remain if the direct path were deactivated. The choice
between TE − DE and IE depends, of course, on the specific decision-making
objectives that the study aims to inform. If the policy evaluated aims
to prevent the outcome Y by way of manipulating the mediating pathways,
the target of analysis should be the difference TE − DE, which measures the
highest prevention effect of any such manipulation. If, on the other hand, the
policy aims to prevent the outcome by manipulating the direct pathway, the
target of analysis should shift to IE, for TE − IE measures the highest preventive
impact of this type of manipulation.

In the hiring discrimination example, TE − DE gives the maximum reduction in racial earning disparity that can be expected from programs aiming to achieve educational parity. TE − IE, on the other hand, measures the maximum reduction in earning disparity that can be expected from eliminating hiring discrimination by employers. The difference-in-coefficients strategy is motivated by the former type of problem, and the product-of-coefficients strategy by the latter.

9Note that the equality $\beta_{xy} - \gamma_{xy} = \gamma_{xz}\gamma_{zy}$ will continue to hold (in linear models) even when the error terms are not independent (as in Fig. 1(b)), since these are structural parameters. In comparison, the regressional version of this equation, $R_{YX} - R_{YX \cdot Z} = R_{ZX} R_{YZ \cdot X}$, which is often used in mediation analysis, is a universal identity among regression coefficients of any three variables and has nothing to do with causation or mediation. It will therefore continue to hold regardless of whether confounders are present, whether the structural parameters are identifiable, whether the underlying model is linear or nonlinear, and whether the arrows in the model of Fig. 1(a) point in the right direction. Moreover, the equality will hold among the OLS estimates of these parameters, regardless of sample size. Therefore, the failure of parameters in nonlinear regression to obey similar equalities should not be construed as an indication of faulty standardization, as suggested by MacKinnon et al. (2007a,b).
The next section illustrates how nonlinearities bring about the disparity between IE and TE − DE.


The  logistic  case 

To see how the Mediation Formula facilitates nonlinear analysis, let us consider the logistic and probit models treated in MacKinnon et al. (2007b).10 To this end, let us retain the linear model of (15) with one modification: the outcome of interest will be a threshold-based indicator of the linear outcome Y in (15). In other words, we regard

$$Y^* = \gamma_{xy} X + \gamma_{zy} Z + u_Y \tag{20}$$

as a latent variable, and define the outcome Y as

$$Y = \begin{cases} 1 & \text{if } \gamma_0 + \gamma_{xy} x + \gamma_{zy} z + u_Y > 0 \\ 0 & \text{otherwise} \end{cases} \tag{21}$$

where $\gamma_0$ is some unknown threshold level. We will assume that the error $U_Y$ is governed by the logistic distribution

$$P(U_Y < u) = L(u) \triangleq \frac{1}{1 + e^{-u}} \tag{22}$$

and, consequently, $E(Y \mid x, z)$ attains the form:

$$E(Y \mid x, z) = \frac{1}{1 + e^{-(\gamma_0 + \gamma_{xy} x + \gamma_{zy} z)}} \tag{23}$$

$$E(Y \mid x, z) = L(\gamma_0 + \gamma_{xy} x + \gamma_{zy} z) \tag{24}$$

We will further assume that $U_Z$ is normal with zero mean and infinitesimal variance $\sigma_Z^2 \ll 1$.

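Given this logistic model and its parameter set $(\gamma_0, \gamma_{xy}, \gamma_{zy}, \gamma_{xz})$, we will now compute the direct (DE), indirect (IE) and total (TE) effects associated with the transition from X = 0 to X = 1. From the Mediation Formula ((12)-(14)), we obtain:

$$DE = \int_{z=-\infty}^{\infty} [L(\gamma_0 + \gamma_{xy} + \gamma_{zy} z) - L(\gamma_0 + \gamma_{zy} z)]\, f_{Z \mid X}(z \mid X = 0)\, dz = L(\gamma_0 + \gamma_{xy}) - L(\gamma_0) + O(\sigma_Z^2) \tag{25}$$

$$IE = \int_{z=-\infty}^{\infty} L(\gamma_0 + \gamma_{zy} z)\,[f_{Z \mid X}(z \mid X = 1) - f_{Z \mid X}(z \mid X = 0)]\, dz = L(\gamma_0 + \gamma_{zy}\gamma_{xz}) - L(\gamma_0) + O(\sigma_Z^2) \tag{26}$$

$$\begin{aligned} TE &= E(Y \mid X = 1) - E(Y \mid X = 0) \\ &= \int_{z=-\infty}^{\infty} E(Y \mid X = 1, z)\, f_{Z \mid X}(z \mid X = 1)\, dz - \int_{z=-\infty}^{\infty} E(Y \mid X = 0, z)\, f_{Z \mid X}(z \mid X = 0)\, dz \\ &= L(\gamma_0 + \gamma_{xy} + \gamma_{zy}\gamma_{xz}) - L(\gamma_0) + O(\sigma_Z^2) \end{aligned} \tag{27}$$

where $O(\sigma_Z^2) \to 0$ as $\sigma_Z \to 0$.

10Pearl (2010) analyzes Boolean models with Bernoulli noise.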

It is clear that, due to the nonlinear nature of L(u), none of these effects coincides with its corresponding effect in the linear model of Eq. (15). In other words, it would be wrong to assert the equalities:

$$DE_{0,1} = \gamma_{xy}, \qquad IE_{0,1} = \gamma_{zy}\gamma_{xz}, \qquad TE_{0,1} = \gamma_{xy} + \gamma_{zy}\gamma_{xz} = \beta_{xy}$$

as is normally assumed in the mediation literature (Prentice, 1989; Freedman et al., 1992; MacKinnon and Dwyer, 1993; Fleming and DeMets, 1996; Molenberghs et al., 2002; MacKinnon et al., 2007b). In particular, the mediated fractions 1 − DE/TE and IE/TE may differ substantially from the fractions $\gamma_{xz}\gamma_{zy}/(\gamma_{xy} + \gamma_{xz}\gamma_{zy})$, $1 - \gamma_{xy}/\beta_{xy}$, or $\gamma_{xz}\gamma_{zy}/\beta_{xy}$ that have been proposed to evaluate mediation effects by traditional methods. The latter are heuristic ratios informed by the linear portion of the model, while the former are derived formally from the counterfactual specifications of the target quantities, as in (2) and (6).

Figure 2 depicts DE, IE, and TE as a function of $\gamma_0$, the threshold coefficient that dichotomizes the outcome (as in Eq. (21)). These were obtained analytically, from Eqs. (25)-(27), using the values $\gamma_{xz} = \gamma_{xy} = \gamma_{zy} = 0.5$ for illustrative purposes. We see that all three measures vary with $\gamma_0$ and deviate substantially from the assumptions that equate DE with $\gamma_{xy} = 0.50$, IE with $\gamma_{xz}\gamma_{zy} = 0.25$, and TE with $\gamma_{xy} + \gamma_{xz}\gamma_{zy} = 0.75$ (MacKinnon and Dwyer, 1993; MacKinnon et al., 2007b).
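The curves of Figure 2 can be traced from the limiting forms of (25)-(27). A minimal Python sketch, assuming the illustrative values $\gamma_{xz} = \gamma_{xy} = \gamma_{zy} = 0.5$ and dropping the $O(\sigma_Z^2)$ terms; the grid of $\gamma_0$ values is an arbitrary choice:

```python
import math


def logistic(u):
    """L(u) = 1 / (1 + exp(-u)), the logistic CDF of Eq. (22)."""
    return 1.0 / (1.0 + math.exp(-u))


def logistic_effects(g0, g_xy=0.5, g_zy=0.5, g_xz=0.5):
    """DE, IE, TE for the X = 0 -> 1 transition, with the O(sigma_Z^2) terms dropped."""
    base = logistic(g0)
    de = logistic(g0 + g_xy) - base                 # Eq. (25)
    ie = logistic(g0 + g_zy * g_xz) - base          # Eq. (26)
    te = logistic(g0 + g_xy + g_zy * g_xz) - base   # Eq. (27)
    return de, ie, te


for g0 in (-4.0, -2.0, 0.0, 2.0, 4.0):
    de, ie, te = logistic_effects(g0)
    print(f"gamma_0={g0:+.1f}  DE={de:.3f}  IE={ie:.3f}  TE={te:.3f}")
```

None of the printed effects comes near the coefficient-based values 0.50, 0.25 and 0.75; differences of L with these arguments stay well below them for every $\gamma_0$.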

Figure 2: Direct (DE), indirect (IE) and total (TE) effects for the logistic model of Eq. (24) as a function of the threshold $\gamma_0$ that dichotomizes the outcome.

The bias produced by such assumptions is further accentuated in Fig. 3, which compares several fractions (or proportions) proposed to measure the relative contribution of mediation to the observed response. Recall that 1 − DE/TE measures the extent to which mediation was necessary for the observed response, while IE/TE measures the extent to which it was sufficient. Figure 3 shows that the necessary fraction (1 − DE/TE) exceeds the sufficient fraction (IE/TE) as $\gamma_0$ becomes more negative. Indeed, in this region, both direct and indirect paths need to be activated for $Y^*$ to exceed the threshold of Eq. (21). Therefore, the fraction of responses for which mediation was necessary is high and the fraction for which mediation was sufficient is low. The disparity between the two is revealed by varying the intercept $\gamma_0$, a parameter that is rarely attended to in traditional analyses but that proves important for understanding the interplay between DE and IE and the role they play in shaping mediated effects. The opposite occurs for positive $\gamma_0$ (negative threshold), where each path alone is sufficient for activating Y, and it is therefore unlikely that the mediator becomes a necessary enabler of Y = 1.

None of these dynamics is represented in the fixed fraction $\gamma_{xz}\gamma_{zy}/(\gamma_{xy} + \gamma_{xz}\gamma_{zy}) = 0.25/0.75 = 1/3$, which standard logistic regression would report as the fraction of cases “explained by mediation.” Some of these dynamics are reflected in the fraction $\gamma_{xz}\gamma_{zy}/[E(Y \mid X = 1) - E(Y \mid X = 0)] = 0.25/TE$ (not shown in Fig. 3), which some researchers have recommended as a measure of mediation (MacKinnon and Dwyer, 1993; MacKinnon et al., 2007b). But this measure is totally incompatible with the correct fractions shown in Fig. 3. The differences are accentuated again for negative $\gamma_0$ (positive threshold), where both direct and indirect processes must be activated for $Y^*$ to exceed the threshold, so that the fraction of responses for which mediation is necessary (1 − DE/TE) is high and the fraction for which mediation is sufficient (IE/TE) is low.

Figure 3: Necessary (1 − DE/TE) and sufficient (IE/TE) mediation proportions for the logistic model of Eq. (24).
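The two fractions discussed above can be tabulated the same way. In the sketch below (same assumptions: $\gamma_{xz} = \gamma_{xy} = \gamma_{zy} = 0.5$ and $\sigma_Z \to 0$), the hypothetical helper `mediation_fractions` returns (1 − DE/TE, IE/TE), and `fixed` is the $\gamma_0$-independent ratio $\gamma_{xz}\gamma_{zy}/(\gamma_{xy} + \gamma_{xz}\gamma_{zy})$.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def mediation_fractions(g0, g_xy=0.5, g_zy=0.5, g_xz=0.5):
    """Return (necessary, sufficient) = (1 - DE/TE, IE/TE) in the sigma_Z -> 0 limit."""
    base = logistic(g0)
    de = logistic(g0 + g_xy) - base
    ie = logistic(g0 + g_zy * g_xz) - base
    te = logistic(g0 + g_xy + g_zy * g_xz) - base
    return 1.0 - de / te, ie / te


# The coefficient-based "proportion mediated" does not depend on gamma_0 at all.
fixed = (0.5 * 0.5) / (0.5 + 0.5 * 0.5)

for g0 in (-4.0, -2.0, 0.0, 2.0, 4.0):
    nec, suf = mediation_fractions(g0)
    print(f"gamma_0={g0:+.1f}  necessary={nec:.3f}  sufficient={suf:.3f}  fixed={fixed:.3f}")
```

At $\gamma_0 = -4$ the necessary fraction exceeds the sufficient one, at $\gamma_0 = +4$ the order is reversed, and `fixed` stays at 1/3 throughout.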

The  probit  case 

Figure 4 displays the behavior of a probit model. It was computed analytically by assuming a normal (probit) distribution for $U_Y$ in Eq. (22), which leads to the same expressions as in (21)-(24), with $\Phi$ replacing L. Notably, Figs. 4 and 5 reflect more pronounced variations of all effects with $\gamma_0$, as well as a more pronounced deviation of these curves from the constant $\gamma_{xz}\gamma_{zy}/(\gamma_{xy} + \gamma_{xz}\gamma_{zy}) = 1/3$ that regression analysis defines as the “proportion mediated” measure (Sjolander, 2009). We speculate that the difference in behavior between the logistic and probit models is due to the latter's sharper approach toward the asymptotic limits.

Figure 4: Direct (DE), indirect (IE) and total (TE) effects for a probit model.
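Swapping the logistic L for the normal CDF $\Phi$ is a one-line change in such a sketch. The block below assumes that $U_Y$ has unit variance in the probit case (an assumption; the scale is not stated above), computes $\Phi$ with `math.erf`, and prints both fractions for the two links at a few thresholds, keeping $\gamma_{xz} = \gamma_{xy} = \gamma_{zy} = 0.5$ and $\sigma_Z \to 0$.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def std_normal_cdf(u):
    """Phi(u), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def fractions(cdf, g0, g_xy=0.5, g_zy=0.5, g_xz=0.5):
    """(1 - DE/TE, IE/TE) for the X = 0 -> 1 transition, sigma_Z -> 0 limit."""
    base = cdf(g0)
    de = cdf(g0 + g_xy) - base
    ie = cdf(g0 + g_zy * g_xz) - base
    te = cdf(g0 + g_xy + g_zy * g_xz) - base
    return 1.0 - de / te, ie / te


for g0 in (-4.0, 0.0, 4.0):
    for name, cdf in (("logit", logistic), ("probit", std_normal_cdf)):
        nec, suf = fractions(cdf, g0)
        print(f"{name:6s} gamma_0={g0:+.1f}  1-DE/TE={nec:.3f}  IE/TE={suf:.3f}")
```

Under these assumptions, the probit fractions lie further from 1/3 than the logistic ones at $\gamma_0 = \pm 4$.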


Figure 5: Necessary (1 − DE/TE) and sufficient (IE/TE) mediation proportions for a probit model.

2.3 Special cases of mediation models

In this section we will discuss four special cases of mediation processes that lend themselves to simplified analysis.

Incremental causal effects

Consider again the logistic threshold model of Eq. (21), and assume we are interested in assessing the response Y to an incremental change in the treatment variable X, say from X = x to X = x + δ. In other words, our target quantities are

$$DE_{inc}(x) = \lim_{\delta \to 0} \frac{1}{\delta} DE_{x,x+\delta}, \qquad IE_{inc}(x) = \lim_{\delta \to 0} \frac{1}{\delta} IE_{x,x+\delta}, \qquad TE_{inc}(x) = \lim_{\delta \to 0} \frac{1}{\delta} TE_{x,x+\delta}.$$


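If we maintain the infinitesimal variance assumption $\sigma_Z^2 \ll 1$, we obtain:

$$\begin{aligned} DE_{inc}(x) &= \lim_{\delta \to 0} \frac{1}{\delta} DE_{x,x+\delta} \\ &= \lim_{\delta \to 0} \frac{1}{\delta} \int_z [E(Y \mid x + \delta, z) - E(Y \mid x, z)]\, f_{Z \mid X}(z \mid x)\, dz \\ &= \frac{\partial}{\partial x} E(Y \mid x, z)\Big|_{z = h(x)} + O(\sigma_Z^2) \end{aligned}$$

where $h(x) = E(Z \mid x)$.

Similarly, we have

$$\begin{aligned} IE_{inc}(x) &= \lim_{\delta \to 0} \frac{1}{\delta} IE_{x,x+\delta} \\ &= \lim_{\delta \to 0} \frac{1}{\delta} \int_z E(Y \mid x, z)\,[f_{Z \mid X}(z \mid x + \delta) - f_{Z \mid X}(z \mid x)]\, dz \\ &= \lim_{\delta \to 0} \frac{1}{\delta} [E(Y \mid x, h(x + \delta)) - E(Y \mid x, h(x))] + O(\sigma_Z^2) \\ &= \frac{\partial}{\partial z} E(Y \mid x, z)\Big|_{z = h(x)} \frac{d}{dx} h(x) + O(\sigma_Z^2) \end{aligned}$$

and

$$TE_{inc}(x) = \lim_{\delta \to 0} \frac{1}{\delta} [E(Y \mid x + \delta) - E(Y \mid x)] = \frac{d}{dx} E(Y \mid x).$$

Since $E(Y \mid x) = E(Y \mid x, h(x)) + O(\sigma_Z^2)$, the chain rule of differentiation gives $TE_{inc} = DE_{inc} + IE_{inc}$, a result obtained by Winship and Mare (1983), though starting from a different perspective.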
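The decomposition $TE_{inc} = DE_{inc} + IE_{inc}$ can be checked with finite differences. The sketch below assumes the logistic threshold model with $h(x) = \gamma_{xz} x$, $\sigma_Z \to 0$, an arbitrarily chosen $\gamma_0 = -1$, and a step `eps`; each derivative is a central difference, so the check holds only up to numerical error. The function name is hypothetical.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def incremental_effects(x, g0=-1.0, g_xy=0.5, g_zy=0.5, g_xz=0.5, eps=1e-6):
    """Central-difference DE_inc, IE_inc, TE_inc at x, in the sigma_Z -> 0 limit."""

    def g(xv, zv):  # E(Y | x, z) for the logistic threshold model
        return logistic(g0 + g_xy * xv + g_zy * zv)

    def h(xv):  # E(Z | x) = g_xz * x, since U_Z has zero mean
        return g_xz * xv

    z = h(x)
    de = (g(x + eps, z) - g(x - eps, z)) / (2 * eps)     # dE(Y|x,z)/dx at z = h(x)
    dg_dz = (g(x, z + eps) - g(x, z - eps)) / (2 * eps)  # dE(Y|x,z)/dz at z = h(x)
    dh_dx = (h(x + eps) - h(x - eps)) / (2 * eps)
    ie = dg_dz * dh_dx
    te = (g(x + eps, h(x + eps)) - g(x - eps, h(x - eps))) / (2 * eps)  # d/dx E(Y|x)
    return de, ie, te


de, ie, te = incremental_effects(0.3)
print(de, ie, te, de + ie)
```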

Linear  outcome  with  binary  mediator 

It is interesting to inquire how effects are decomposed when we retain the linear form of the outcome process but let the intermediate variable Z be a binary variable that is related to X through an arbitrary nonlinear process P(Z = 1|x).

Considering a transition from X = $x_0$ to X = $x_1$ and writing

$$E(Y \mid x, z) = \alpha x + \beta z + \gamma,$$

we readily obtain:

$$DE = \sum_z [(\alpha x_1 + \beta z + \gamma) - (\alpha x_0 + \beta z + \gamma)]\, P(z \mid x_0) = \alpha(x_1 - x_0) \tag{28}$$

$$IE = \sum_z (\alpha x_0 + \beta z + \gamma)\,[P(z \mid x_1) - P(z \mid x_0)] = \beta[E(Z \mid x_1) - E(Z \mid x_0)] \tag{29}$$

$$\begin{aligned} TE &= \sum_z E(Y \mid x_1, z)P(z \mid x_1) - \sum_z E(Y \mid x_0, z)P(z \mid x_0) \\ &= \sum_z (\alpha x_1 + \beta z + \gamma)P(z \mid x_1) - \sum_z (\alpha x_0 + \beta z + \gamma)P(z \mid x_0) \\ &= \alpha(x_1 - x_0) + \beta[E(Z \mid x_1) - E(Z \mid x_0)] \end{aligned} \tag{30}$$

Again, we have TE = DE + IE.

We see that as long as the outcome process is linear, nonlinearities in the mediation process do not introduce any surprises; effects are decomposed into their direct and indirect components in a textbook-like fashion. Moreover, the distribution P(Z|x) plays no role in the analysis; only the expectation E(Z|x) needs to be estimated.

This result was obtained by Li et al. (2007), who estimated IE using a difference-in-coefficients strategy. It follows, in fact, from a more general property of the Mediation Formula (13), first noted by VanderWeele (2009), which will be discussed next.
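For a binary mediator the sums in (28)-(30) have only two terms, so the decomposition is easy to verify. In the sketch below the coefficient values and the logistic form of P(Z = 1 | x) are hypothetical choices; `mediation_formula_binary_z` simply evaluates the three Mediation Formula sums as written in (28)-(30), with IE evaluated at $x_0$.

```python
import math


def p_z1(x):
    """Hypothetical nonlinear mediator model P(Z = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x - 1.0)))


def mediation_formula_binary_z(g, pz1, x0, x1):
    """DE, IE, TE for a binary mediator, evaluated as the Mediation Formula sums over z."""

    def p(z, x):  # P(Z = z | x)
        return pz1(x) if z == 1 else 1.0 - pz1(x)

    de = sum((g(x1, z) - g(x0, z)) * p(z, x0) for z in (0, 1))
    ie = sum(g(x0, z) * (p(z, x1) - p(z, x0)) for z in (0, 1))
    te = sum(g(x1, z) * p(z, x1) - g(x0, z) * p(z, x0) for z in (0, 1))
    return de, ie, te


ALPHA, BETA, GAMMA = 0.8, 1.5, 0.2  # hypothetical coefficients of E(Y|x,z)


def g_linear(x, z):
    """Linear outcome process E(Y | x, z) = alpha*x + beta*z + gamma."""
    return ALPHA * x + BETA * z + GAMMA


de, ie, te = mediation_formula_binary_z(g_linear, p_z1, x0=0.0, x1=1.0)
print(de, ALPHA * (1.0 - 0.0))             # Eq. (28)
print(ie, BETA * (p_z1(1.0) - p_z1(0.0)))  # Eq. (29); for binary Z, E(Z|x) = P(Z=1|x)
print(te, de + ie)                         # Eq. (30): TE = DE + IE
```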

Semi-linear  outcome  process 

Suppose E(Y|x,z) is linear in Z, but not necessarily in X. We can then write

$$E(Y \mid x, z) = g(x, z) = f(x) + t(x)z$$

$$E(Z \mid x) = h(x)$$

and the Mediation Formulas give (for the transition from X = $x_0$ to X = $x_1$):

$$\begin{aligned} DE &= \sum_z \{(f(x_1) + t(x_1)z) - (f(x_0) + t(x_0)z)\}\, P(z \mid x_0) \\ &= f(x_1) - f(x_0) + (t(x_1) - t(x_0))\,E(Z \mid x_0) \\ &= g(x_1, h(x_0)) - g(x_0, h(x_0)) \end{aligned} \tag{31}$$

$$\begin{aligned} IE &= \sum_z (f(x_0) + t(x_0)z)(P(z \mid x_1) - P(z \mid x_0)) \\ &= t(x_0)(E(Z \mid x_1) - E(Z \mid x_0)) \\ &= t(x_0)[h(x_1) - h(x_0)] \end{aligned} \tag{32}$$

$$\begin{aligned} TE &= \sum_z (f(x_1) + t(x_1)z)P(z \mid x_1) - \sum_z (f(x_0) + t(x_0)z)P(z \mid x_0) \\ &= f(x_1) - f(x_0) + t(x_1)E(Z \mid x_1) - t(x_0)E(Z \mid x_0) \\ &= g(x_1, h(x_1)) - g(x_0, h(x_0)) \end{aligned} \tag{33}$$

We see again that only the conditional mean E(Z|x) needs to enter the estimation of causal effects in this model, not the entire distribution P(z|x). However, the equality TE = DE + IE no longer holds in this case; the nonlinearities embedded in the interaction term t(x)z may render Z an enabler or inhibitor of the direct path, thus violating the additive relationship between the three effect measures.
This becomes more transparent when we examine the standard linear model to which a multiplicative term xz is added, as is done, for example, in the analyses of Kraemer et al. (2008), Jo (2008), and Preacher et al. (2007). In this model we have

$$E(Y \mid x, z) = g(x, z) = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 xz$$

$$E(Z \mid x) = h(x) = \gamma_0 + \gamma_1 x$$

Substituting

$$f(x) = \beta_0 + \beta_1 x, \qquad t(x) = \beta_2 + \beta_3 x, \qquad h(x) = \gamma_0 + \gamma_1 x$$

in (31)-(33) and letting $x_1 - x_0 = 1$ gives

$$DE = \beta_1 + \beta_3(\gamma_0 + \gamma_1 x_0) \tag{34}$$

$$IE = \gamma_1(\beta_2 + \beta_3 x_0) \tag{35}$$

$$TE = \beta_1 + \beta_3\gamma_0 + \beta_2\gamma_1 + \beta_3\gamma_1(x_0 + x_1) \tag{36}$$

In particular, the relationship between DE, IE, and TE becomes

$$TE = DE + IE + \gamma_1\beta_3,$$

which clearly identifies the product $\gamma_1\beta_3$ as the culprit for the non-additivity TE ≠ DE + IE. Indeed, when $\gamma_1\beta_3 \neq 0$, Z acts both as a moderator and a mediator, and both DE and IE are affected by the interaction term $\beta_3 xz$. Note further that the direct and indirect effects can both be zero while the total effect is non-zero, a familiar nonlinear phenomenon that occurs when Z is a necessary enabler for the effect of X on Y. This dynamic has escaped standard analyses of mediation, which focused exclusively on estimating structural parameters rather than effect measures, as in (34)-(36).

It is interesting to note that, due to interaction, a direct effect can exist even when $\beta_1$ vanishes, though $\beta_1$ is the path coefficient associated with the direct link X → Y. This illustrates that estimating parameters in isolation tells us little about the problem until we understand the way they combine to form effect measures. More generally, mediation and moderation are inextricably intertwined and cannot be assessed separately, a position affirmed by Kraemer et al. (2008) and Preacher et al. (2007).
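Equations (31)-(36) lend themselves to a direct check. In the sketch below, the hypothetical helper `semi_linear_effects` evaluates (31)-(33) for any supplied f, t and h, and `interaction_effects` plugs in the interaction model (b0..b3 stand for $\beta_0..\beta_3$, c0 and c1 for $\gamma_0, \gamma_1$); the coefficient values are hypothetical. The last line illustrates the enabler case described above, in which DE and IE vanish but TE does not.

```python
def semi_linear_effects(f, t, h, x0, x1):
    """DE, IE, TE from Eqs. (31)-(33) when E(Y|x,z) = f(x) + t(x)*z and E(Z|x) = h(x)."""

    def g(x, z):
        return f(x) + t(x) * z

    de = g(x1, h(x0)) - g(x0, h(x0))  # Eq. (31)
    ie = t(x0) * (h(x1) - h(x0))      # Eq. (32)
    te = g(x1, h(x1)) - g(x0, h(x0))  # Eq. (33)
    return de, ie, te


def interaction_effects(b0, b1, b2, b3, c0, c1, x0):
    """E(Y|x,z) = b0 + b1 x + b2 z + b3 x z and E(Z|x) = c0 + c1 x, with x1 = x0 + 1."""
    return semi_linear_effects(
        f=lambda x: b0 + b1 * x,
        t=lambda x: b2 + b3 * x,
        h=lambda x: c0 + c1 * x,
        x0=x0,
        x1=x0 + 1,
    )


# Hypothetical coefficients, for illustration only.
b0, b1, b2, b3, c0, c1, x0 = 0.1, 0.4, 0.3, 0.2, 0.5, 0.6, 1.0
de, ie, te = interaction_effects(b0, b1, b2, b3, c0, c1, x0)
print(de, b1 + b3 * (c0 + c1 * x0))                         # Eq. (34)
print(ie, c1 * (b2 + b3 * x0))                              # Eq. (35)
print(te, b1 + b3 * c0 + b2 * c1 + b3 * c1 * (2 * x0 + 1))  # Eq. (36), x0 + x1 = 2*x0 + 1
print(te - de - ie, c1 * b3)                                # the non-additive remainder

# Z as a pure enabler: b1 = b2 = c0 = 0 and x0 = 0 give DE = IE = 0 while TE = c1 * b3.
print(interaction_effects(0.0, 0.0, 0.0, 0.5, 0.0, 0.8, 0.0))
```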
The binary case

To complete our discussion of models in which the mediation problem lends itself to a simple solution, we now address the case where all variables are binary, while still allowing for arbitrary interactions and arbitrary distributions of all processes. The low dimensionality of the binary case permits both a nonparametric solution and an explicit demonstration of how mediation can be estimated directly from the data. Generalizations to multi-valued outcomes are straightforward.

Assume that the model of Fig. 1(a) is valid and that the observed data are given by Table 1.

| Count | X | Z | Y | g_xz = E(Y∣x,z) | h_x = E(Z∣x) |
|---|---|---|---|---|---|
| n1 | 0 | 0 | 0 | g00 = n2/(n1+n2) | h0 = (n3+n4)/(n1+n2+n3+n4) |
| n2 | 0 | 0 | 1 | | |
| n3 | 0 | 1 | 0 | g01 = n4/(n3+n4) | |
| n4 | 0 | 1 | 1 | | |
| n5 | 1 | 0 | 0 | g10 = n6/(n5+n6) | h1 = (n7+n8)/(n5+n6+n7+n8) |
| n6 | 1 | 0 | 1 | | |
| n7 | 1 | 1 | 0 | g11 = n8/(n7+n8) | |
| n8 | 1 | 1 | 1 | | |

Table 1: Computing the Mediation Formula.

The factors E(Y|x,z) and P(Z|x) in Eqs. (12)-(14) can be readily estimated as shown in the two right-most columns of Table 1 and, when substituted in (12)-(14), yield:

$$DE = (g_{10} - g_{00})(1 - h_0) + (g_{11} - g_{01})\,h_0 \tag{37}$$

$$IE = (h_1 - h_0)(g_{01} - g_{00}) \tag{38}$$

$$TE = g_{11}h_1 + g_{10}(1 - h_1) - [g_{01}h_0 + g_{00}(1 - h_0)] \tag{39}$$

We see that logistic or probit regression is not necessary; simple arithmetic operations suffice to provide a general solution for any conceivable dataset.
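Since (37)-(39) involve only cell proportions, the whole procedure of Table 1 fits in a few lines. A sketch, assuming the data arrive as a list of (x, z, y) triples and that every (x, z) cell is non-empty; the function name and the counts in the usage example are hypothetical:

```python
from collections import Counter


def binary_mediation(records):
    """DE, IE, TE of Eqs. (37)-(39) from observed (x, z, y) triples with values in {0, 1}.

    Assumes the model of Fig. 1(a); every (x, z) cell must contain at least one unit.
    """
    n = Counter(records)  # n[(x, z, y)] plays the role of the counts n1..n8 in Table 1

    def g(x, z):  # g_xz = E(Y | x, z): share of Y = 1 among units in cell (x, z)
        return n[(x, z, 1)] / (n[(x, z, 0)] + n[(x, z, 1)])

    def h(x):  # h_x = E(Z | x): share of Z = 1 among units with this x
        with_z = n[(x, 1, 0)] + n[(x, 1, 1)]
        return with_z / (with_z + n[(x, 0, 0)] + n[(x, 0, 1)])

    g00, g01, g10, g11 = g(0, 0), g(0, 1), g(1, 0), g(1, 1)
    h0, h1 = h(0), h(1)
    de = (g10 - g00) * (1 - h0) + (g11 - g01) * h0                # Eq. (37)
    ie = (h1 - h0) * (g01 - g00)                                  # Eq. (38)
    te = g11 * h1 + g10 * (1 - h1) - (g01 * h0 + g00 * (1 - h0))  # Eq. (39)
    return de, ie, te


# Hypothetical cell counts n[(x, z, y)], for illustration only.
counts = {(0, 0, 0): 40, (0, 0, 1): 10, (0, 1, 0): 25, (0, 1, 1): 25,
          (1, 0, 0): 12, (1, 0, 1): 8, (1, 1, 0): 24, (1, 1, 1): 56}
records = [cell for cell, k in counts.items() for _ in range(k)]
print(binary_mediation(records))
```

For these counts g00 = 0.2, g01 = 0.5, g10 = 0.4, g11 = 0.7, h0 = 0.5 and h1 = 0.8, so the call returns DE = 0.2, IE = 0.09 and TE = 0.29 up to floating-point rounding.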

Numerical  example 

To anchor these formulas in a concrete example, let us assume that X = 1 stands for a drug treatment, Y = 1 for recovery, and Z = 1 for the presence of a certain enzyme in a patient's blood which appears to be stimulated by the treatment. Assume further that the data described in Tables 2 and 3 were obtained in a randomized clinical trial and that our research question is whether Z mediates the action of X on Y, or is merely a catalyst that accelerates the action of X on Y.

| Treatment X | Enzyme present Z | Percentage cured g_xz = E(Y∣x,z) |
|---|---|---|
| YES | YES | g11 = 80% |
| YES | NO | g10 = 40% |
| NO | YES | g01 = 30% |
| NO | NO | g00 = 20% |

Table 2: Percentage cured, by treatment and enzyme presence.

| Treatment X | Percentage with Z present |
|---|---|
| NO | h0 = 40% |
| YES | h1 = 75% |

Table 3: Percentage of patients with the enzyme present, by treatment.

Substituting these data into Eqs. (37)-(39) yields:

$$DE = (0.40 - 0.20)(1 - 0.40) + (0.80 - 0.30)(0.40) = 0.32$$

$$IE = (0.75 - 0.40)(0.30 - 0.20) = 0.035$$

$$TE = 0.80 \times 0.75 + 0.40 \times 0.25 - (0.30 \times 0.40 + 0.20 \times 0.60) = 0.46$$

$$IE/TE \approx 0.076, \qquad DE/TE \approx 0.696, \qquad 1 - DE/TE \approx 0.304$$

We conclude that 30.4% of the treatment-induced recoveries owe their occurrence to the capacity of the treatment to stimulate the secretion of the enzyme, while only about 7.6% of them would be sustained by enzyme stimulation alone. The enzyme seems to act more as a catalyst for the healing process of X than as a healing agent of its own. The policy implication of such a study would be that efforts to substitute the drug with an alternative stimulant of the enzyme are not likely to be effective; the drug evidently has a beneficial effect on recovery that is independent of, though enhanced by, enzyme stimulation.

For completeness, we note that the controlled direct effects are (using (1)):

$$CDE_{Z=0} = g_{10} - g_{00} = 0.40 - 0.20 = 0.20$$

and

$$CDE_{Z=1} = g_{11} - g_{01} = 0.80 - 0.30 = 0.50,$$

which are quite far apart. Their weighted average, governed by P(Z = 1|X = 0) = $h_0$ = 0.40, gives us DE = 0.32. These do not enter, however, into the calculation of IE, since the indirect effect cannot be based on controlling variables; it requires instead a path-deactivating operator, as mirrored in the definition of Eq. (6).
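The arithmetic above, together with the controlled direct effects and their $h_0$-weighted average, can be reproduced directly from the entries of Tables 2 and 3. A minimal sketch (plain dictionaries keyed as in the tables; the variable names are illustrative):

```python
g = {(1, 1): 0.80, (1, 0): 0.40, (0, 1): 0.30, (0, 0): 0.20}  # Table 2: g_xz = E(Y|x,z)
h = {0: 0.40, 1: 0.75}                                       # Table 3: h_x = E(Z|x)

de = (g[1, 0] - g[0, 0]) * (1 - h[0]) + (g[1, 1] - g[0, 1]) * h[0]                    # Eq. (37)
ie = (h[1] - h[0]) * (g[0, 1] - g[0, 0])                                              # Eq. (38)
te = g[1, 1] * h[1] + g[1, 0] * (1 - h[1]) - (g[0, 1] * h[0] + g[0, 0] * (1 - h[0]))  # Eq. (39)

cde_z0 = g[1, 0] - g[0, 0]  # controlled direct effect with Z held at 0
cde_z1 = g[1, 1] - g[0, 1]  # controlled direct effect with Z held at 1
weighted = (1 - h[0]) * cde_z0 + h[0] * cde_z1

print(f"DE={de:.3f} IE={ie:.3f} TE={te:.3f}")
print(f"IE/TE={ie / te:.3f} DE/TE={de / te:.3f} 1-DE/TE={1 - de / te:.3f}")
print(f"CDE(Z=0)={cde_z0:.2f} CDE(Z=1)={cde_z1:.2f} h0-weighted={weighted:.2f}")
```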

3  Relation  to  Other  Methods 

3.1  Methods  based  on  differences  and  products 

Attempts to compare these results to those produced by conventional mediation analyses encounter two obstacles. First, conventional methods do not define direct and indirect effects in causal vocabulary, without committing to specific functional or distributional forms. MacKinnon (2008, Ch. 11), for example, analyzes categorical data using logistic and probit regressions and constructs effect measures using products and differences of the parameters in those regressional forms. Section 2 demonstrates that this strategy is not compatible with the causal interpretation of effect measures, even when the parameters are known precisely; IE and DE may be extremely complicated functions of those regression coefficients (see Eqs. (25)-(26)). Fortunately, those coefficients need not be estimated at all; effect measures can be estimated directly from the data, circumventing the parametric representation altogether.

Second, attempts to extend the difference and product heuristics to nonparametric analysis have encountered ambiguities that conventional analysis fails to resolve. The product-of-coefficients heuristic advises us to multiply the slope of Z on X,

$$C_\beta = E(Z \mid X = 1) - E(Z \mid X = 0) = h_1 - h_0,$$

by the slope of Y on Z fixing X,

$$C_\gamma = E(Y \mid X = x, Z = 1) - E(Y \mid X = x, Z = 0) = g_{x1} - g_{x0},$$

but does not specify at what value we should fix X. Equation (38) resolves this ambiguity by determining that X should be fixed to X = 0; only then would the product $C_\beta C_\gamma$ yield the correct mediation measure, IE.

The difference-in-coefficients heuristic instructs us to estimate the direct effect coefficient

$$C_\alpha = E(Y \mid X = 1, Z = z) - E(Y \mid X = 0, Z = z) = g_{1z} - g_{0z}$$

and subtract it from the total effect, but does not specify on what value we should condition Z. Equation (37) determines that the correct way of estimating $C_\alpha$ would be to condition on both Z = 0 and Z = 1 and take their weighted average, with $h_0$ = P(Z = 1|X = 0) as the weighting function.

To summarize, in calculating DE, we should condition on both Z = 1 and Z = 0 and average while, in calculating IE, we should condition on only one value, X = 0, and no average need be taken.
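The role of the fixing value can be made concrete with the Table 2 and Table 3 entries. In this sketch (helper names hypothetical), the product $C_\beta C_\gamma$ is evaluated with X fixed at 0 and at 1, and $C_\alpha$ is averaged over Z with weight $h_0$:

```python
g = {(1, 1): 0.80, (1, 0): 0.40, (0, 1): 0.30, (0, 0): 0.20}  # Table 2: g_xz = E(Y|x,z)
h = {0: 0.40, 1: 0.75}                                       # Table 3: h_x = E(Z|x)

c_beta = h[1] - h[0]  # slope of Z on X


def c_gamma(x):
    """Slope of Y on Z with X fixed at x."""
    return g[x, 1] - g[x, 0]


def c_alpha(z):
    """Slope of Y on X with Z fixed at z."""
    return g[1, z] - g[0, z]


ie_product = c_beta * c_gamma(0)                           # X fixed at 0, as in Eq. (38)
other_product = c_beta * c_gamma(1)                        # X fixed at 1, for comparison
de_weighted = (1 - h[0]) * c_alpha(0) + h[0] * c_alpha(1)  # weighted as in Eq. (37)

print(f"C_beta*C_gamma: X=0 -> {ie_product:.3f}, X=1 -> {other_product:.3f}")
print(f"C_alpha: Z=0 -> {c_alpha(0):.2f}, Z=1 -> {c_alpha(1):.2f}, h0-weighted -> {de_weighted:.2f}")
```

Fixing X = 0 reproduces IE = 0.035, whereas fixing X = 1 would give 0.35 × 0.40 = 0.14; averaging $C_\alpha$ = 0.20 and 0.50 with weight $h_0$ = 0.40 reproduces DE = 0.32.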

Reiterating the discussion of Section 2, the difference and product heuristics are both legitimate, with each seeking a different effect measure. The difference-in-coefficients heuristic, leading to TE − DE, seeks to measure the percentage of units for which mediation was necessary. The product-of-coefficients heuristic, on the other hand, leading to IE, seeks to estimate the percentage of units for which mediation was sufficient. The former informs policies aiming to modify mediating pathways, while the latter informs those aiming to modify the direct pathway.

The ability of the Mediation Formula to move across the linear-nonlinear barrier may suggest that all mediation-related questions can now be answered nonparametrically and, more specifically, that, similar to traditional path analysis in linear systems (Alwin and Hauser, 1975; Bollen, 1989), we can now assess the extent to which an effect is mediated through ANY chosen path or bundle of paths in a causal diagram. This turns out not to be the case. Avin et al. (2005) showed that there are bundles of paths (i.e., subgraphs) in a graph whose mediation effects cannot be assessed from either observational or experimental studies, even in the absence of unobserved confounders. They proved that the effect mediated by a subgraph SG is estimable if and only if SG contains no “broken fork,” that is, a path $p_1$ from X to some W, and two paths, $p_2$ and $p_0$, from W to Y, such that $p_1$ and $p_2$ are in SG and $p_0$ is in G but not in SG. Such subgraphs may exist if any of the mediator-outcome confounders ($W_2$ in Fig. 1) is a descendant of X.

3.2  Relation  to  Principal-Strata  Direct  Effect 

The derivation of the Mediation Formula (Pearl, 2001) was made possible by the counterfactual interpretation of structural equations (see footnote 1) and the symbiosis between graphical and counterfactual analysis that this interpretation engenders.11 In contrast, the structure-less approach of Rubin (1974) has spawned other definitions of direct effects, normally referred to as “principal-strata direct effect (PSDE)” (Frangakis and Rubin, 2002; Mealli and Rubin, 2003; Rubin, 2004, 2005; Egleston et al., 2010). Whereas the natural direct effect measures the average effect that would be transmitted in the population with all mediating paths (hypothetically) deactivated, the PSDE is defined as the effect transmitted only in those units for whom mediating paths happened to be deactivated in the study. This definition leads to unintended results that stand contrary to common usage of direct effects (Robins et al., 2007, 2009; VanderWeele, 2008), excluding from the analysis all individuals who are both directly and indirectly affected by the causal variable X (Pearl, 2009b). In linear models, as a striking example, a direct effect will be flatly undefined unless β, the X → Z coefficient, is zero. In some other cases, the direct effect of the treatment will be deemed to be nil if a small subpopulation exists for which treatment has no effect on both Y and Z.

11Such symbiosis is now standard in epidemiology research (Robins, 2001; Petersen et al., 2006; VanderWeele and Robins, 2007; Hafeman and Schwartz, 2009; VanderWeele, 2009; Albert and Nelson, 2011) and is making its way slowly toward the social and behavioral sciences (e.g., Morgan and Winship (2007); Imai et al. (2010a,c); Elwert and Winship (2010); Chalak and White (2011)), despite islands of resistance (Wilkinson et al., 1999, p. 600; Sobel, 2008; Rubin, 2010; Imbens, 2010).

To witness what the PSDE estimates in the example of Table 1, we should note that the PSDE, like the natural direct effect, also computes a weighted average of the two controlled direct effects,12

$$PSDE = \alpha\, CDE_{Z=1} + (1 - \alpha)\, CDE_{Z=0}. \tag{40}$$

However, the weight α is not identifiable and may range from zero to one depending on unobserved factors. If in addition to non-confoundedness we also assume monotonicity (i.e., $Z_1 \geq Z_0$), α is identified and is given by the relative sizes of the two extreme ends of the population:

$$\alpha = \frac{P(Z = 1 \mid X = 0)}{P(Z = 1 \mid X = 0) + P(Z = 0 \mid X = 1)}.$$

For example, if P(Z = 1|X = 0) = 0.10 and P(Z = 0|X = 1) = 0.01, about 90% of the weight will be placed on $CDE_{Z=1}$ and the PSDE will be close to $CDE_{Z=1}$. If, on the other hand, P(Z = 1|X = 0) = 0.01 and P(Z = 0|X = 1) = 0.10, the PSDE will be close to $CDE_{Z=0}$. Such sensitivity to the exceptional cases in the population is not what we usually expect from direct effects.

12We take here the definition used in Gallop et al. (2009), $PSDE = E[Y_1 - Y_0 \mid Z_0 = Z_1]$, and, invoking non-confoundedness, $Y_{x,z} \perp\!\!\!\perp Z_{x'}$, together with composition $Y_x = Y_{x,Z_x}$, we obtain

$$\begin{aligned} PSDE &= \sum_z E(Y_{1,Z_1} - Y_{0,Z_0} \mid Z_1 = Z_0 = z)\, P(Z_1 = z \mid Z_0 = Z_1) \\ &= \sum_z E(Y_{1,z} - Y_{0,z} \mid Z_1 = Z_0 = z)\, P(Z_1 = z \mid Z_0 = Z_1) \\ &= \sum_z E(Y_{1,z} - Y_{0,z})\, P(Z_1 = z \mid Z_0 = Z_1) \end{aligned}$$

which reduces to Eq. (40), with $\alpha = P(Z_1 = 1 \mid Z_0 = Z_1)$.
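The sensitivity of the PSDE to the two extreme strata can be illustrated with a short calculation. The sketch below follows the derivation in footnote 12, weighting $CDE_{Z=z}$ by $P(Z_1 = z \mid Z_0 = Z_1)$, and assumes no confounding and monotonicity ($Z_1 \geq Z_0$); the function name is hypothetical, the CDE values 0.20 and 0.50 are borrowed from the numerical example purely for illustration, and the third scenario plugs in $h_0 = 0.40$ and $1 - h_1 = 0.25$ from Table 3 (whether monotonicity holds there is itself an assumption the data cannot confirm).

```python
def psde_under_monotonicity(cde_z0, cde_z1, p_z1_given_x0, p_z0_given_x1):
    """Weight CDE_z by P(Z1 = z | Z0 = Z1), assuming no confounding and Z1 >= Z0.

    Under monotonicity the units with Z0 = Z1 = 1 have share P(Z=1|X=0) and the
    units with Z0 = Z1 = 0 have share P(Z=0|X=1); all other units are excluded.
    """
    w1 = p_z1_given_x0 / (p_z1_given_x0 + p_z0_given_x1)  # P(Z1 = 1 | Z0 = Z1)
    return w1 * cde_z1 + (1.0 - w1) * cde_z0, w1


CDE_Z0, CDE_Z1 = 0.20, 0.50  # controlled direct effects from the numerical example

for p10, p01 in ((0.10, 0.01), (0.01, 0.10), (0.40, 0.25)):
    psde, w1 = psde_under_monotonicity(CDE_Z0, CDE_Z1, p10, p01)
    print(f"P(Z=1|X=0)={p10:.2f}  P(Z=0|X=1)={p01:.2f}  weight on CDE(Z=1)={w1:.3f}  PSDE={psde:.3f}")
```

Under these assumptions the third scenario gives a PSDE of about 0.385, compared with the natural direct effect DE = 0.32 computed for the same data.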

In view of these definitional inadequacies, we do not include “principal-strata direct effects” in our discussion of mediation, though they may well be suited for other applications,13 for example, when a stratum-specific property is genuinely the focus of one's research.

Indeed, taking a “principal strata” perspective, Rubin found the concept of mediation “ill-defined.” In his words: “The general theme here is that the concepts of direct and indirect causal effects are generally ill-defined and often more deceptive than helpful to clear statistical thinking in real, as opposed to artificial problems” (Rubin, 2004). Conversely, attempts to define and understand mediation using the notion of “principal-strata direct effect” have encountered basic conceptual difficulties (Lauritzen, 2004; Robins et al., 2007, 2009; Pearl, 2009b), leading VanderWeele (2008) to conclude that “it is not always clear that knowing about the presence of principal stratification effects will be of particular use.” As a result, it is becoming widely recognized that the controlled, natural and indirect effects discussed in this paper are of greater interest, both for the purposes of making treatment decisions and for the purposes of explanation and identifying causal mechanisms (Joffe et al., 2007; Albert and Nelson, 2011; Mortensen et al., 2009; Imai et al., 2010a,c; Geneletti, 2007; Robins et al., 2007, 2009; Petersen et al., 2006; Hafeman and Schwartz, 2009; Kaufman, 2010; Cai et al., 2008).

The limitation of PSDE stems not from the notion of “principal strata” per se, which is merely a classification of units into homogeneously reacting classes and has been used advantageously by many researchers (Balke and Pearl, 1994a,b; Pearl, 1993; Balke and Pearl, 1997; Heckerman and Shachter, 1995; Pearl, 2000, p. 264; Lauritzen, 2004; Sjolander, 2009). Rather, the limitation results from strict adherence to an orthodox philosophy which prohibits one from regarding a mediator as a cause unless it is manipulable. This prohibition prevents one from defining the direct effect as it is commonly used in decision making and scientific discourse, namely as an effect transmitted once all mediating paths are “deactivated” (Pearl, 2001; Avin et al., 2005; Albert and Nelson, 2011), and forces one to use statistical conditionalization instead. Path deactivation requires counterfactual constructs in which the mediator acts as an antecedent, as in Eqs. (1), (2) and (6), regardless of whether it is physically manipulable. After all, if our aim is to uncover causal mechanisms, it is hard to accept the PSDE restriction that nature's pathways should depend on whether we have the technology to manipulate one variable or another. (For a comprehensive public discussion of these issues, including opinions from enthusiastic and disappointed practitioners, see Pearl, 2011; Sjolander, 2011; Baker et al., 2011; Egleston, 2011.)

13Joffe and Green (2009) and Pearl and Bareinboim (2011) examine the adequacy of the “principal-strata” definition of surrogate outcomes, a notion related, though not identical, to mediation. There, too, the restrictions imposed by the “principal-strata” framework lead to surrogacy criteria that are incompatible with the practical aims of surrogacy (see Pearl, 2011).

4  Conclusions 

Traditional methods of mediation analysis produce distorted estimates of “mediation effects” when applied to nonlinear models or models with categorical variables. By focusing on parameters of logistic and probit estimators instead of on the target effect measures themselves, traditional methods produce consistent estimates of the former and biased estimates of the latter. This paper demonstrates that the bias can be substantial even in simple systems in which all processes are correctly parameterized and only the outcome is dichotomized. The paper offers a causally sound alternative that ensures bias-free estimates while making no assumptions about the distributional form of the underlying process.

We distinguished between the proportion of response cases for which mediation was necessary and the proportion for which mediation would have been sufficient. Both measures play a role in mediation analysis, and both are given here a formal representation and an effective estimation method through the Mediation Formula.
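The two proportions can be computed directly from the Mediation Formula once the conditional distributions are in hand. The sketch below assumes a binary treatment X switched from 0 to 1, a discrete mediator Z, no unmeasured confounders, and hypothetical probabilities (not data from this paper). It takes IE/TE as the fraction of responses for which mediation would have been sufficient and 1 - DE/TE, i.e., (TE - DE)/TE, as the fraction for which mediation was necessary; the function and variable names are illustrative only.

```python
def mediation_formula(p_z_given_x, e_y_given_xz):
    """Natural direct, indirect, and total effects of switching X from 0 to 1.

    p_z_given_x[x][z]    = P(Z = z | X = x)
    e_y_given_xz[(x, z)] = E(Y | X = x, Z = z)
    Assumes no unmeasured confounders, as the Mediation Formula requires.
    """
    zs = list(p_z_given_x[0])
    # DE: change X from 0 to 1 while Z keeps the distribution it has under X = 0.
    de = sum((e_y_given_xz[(1, z)] - e_y_given_xz[(0, z)]) * p_z_given_x[0][z] for z in zs)
    # IE: hold X = 0 and shift Z from its X = 0 distribution to its X = 1 distribution.
    ie = sum(e_y_given_xz[(0, z)] * (p_z_given_x[1][z] - p_z_given_x[0][z]) for z in zs)
    # TE: E(Y | X = 1) - E(Y | X = 0).
    te = sum(e_y_given_xz[(1, z)] * p_z_given_x[1][z]
             - e_y_given_xz[(0, z)] * p_z_given_x[0][z] for z in zs)
    return de, ie, te


# Hypothetical numbers that include an X-Z interaction.
p_z_given_x = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}
e_y_given_xz = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.8}

de, ie, te = mediation_formula(p_z_given_x, e_y_given_xz)
sufficient = ie / te      # fraction of responses for which mediation would have been sufficient
necessary = 1 - de / te   # fraction of responses for which mediation was necessary
print(f"DE = {de:.3f}, IE = {ie:.3f}, TE = {te:.3f}")            # DE = 0.220, IE = 0.070, TE = 0.395
print(f"sufficient = {sufficient:.3f}, necessary = {necessary:.3f}")  # sufficient = 0.177, necessary = 0.443
```

Because of the interaction in these made-up numbers, the two fractions differ (roughly 18% versus 44%). If E(Y | x, z) were additive in x and z, DE + IE would equal TE and the two fractions would coincide.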

In addition to providing causally sound estimates of mediation effects, the Mediation Formula also enables researchers to evaluate analytically the effectiveness of various parametric specifications relative to any assumed model. For example, it would be straightforward to investigate the distortion created by assuming a logistic model (as in (23)) when the data are in fact generated by a probit distribution, or vice versa. This exercise would amount to finding the maximum-likelihood (ML) estimates of γ0, γxy, and γzy in (24) for data generated by a probit distribution, and comparing the effect measures computed through (25)-(27) with the true values of those measures, as dictated by the probit model.14 This type of analytical “sensitivity analysis” has been used extensively in statistics for parameter estimation, but could not be adequately applied to mediation analysis, owing to the absence of an objective target quantity that captures the notion of indirect effect in nonlinear systems. MacKinnon et al. (2007b), for example, evaluated sensitivity to misspecifications by comparing the estimated parameters against their true values, though disparities in parameters may not represent disparities in effect measures (i.e., DE or IE). By providing such objective measures of effects, the Mediation Formula of Eq. (13) enables us to measure directly the disparities in the target quantities.15

14 An alternative would be to find the ML estimates of DE, IE, and TE directly, through (25), (26) and (27), rather than going through (23) (van der Laan and Rubin, 2006).
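The logistic-versus-probit sensitivity exercise can be sketched at the population level, i.e., with the true distribution in place of a finite sample, so that the ML fit is the logistic model closest in expected log-likelihood to the probit truth. This is a hypothetical stand-in, not the paper's Eqs. (23)-(27): X and Z are both binary, the probit coefficients a0, ax, az and the probabilities P(X = 1) and P(Z = 1 | x) are made up, and the misspecified logistic coefficients (g0, gx, gz) are fitted by Newton-Raphson. DE, IE, and TE are then computed with the Mediation Formula, once from the true probit probabilities and once from the fitted logistic ones.

```python
import math


def probit(t):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def solve3(a, b):
    """Solve the 3x3 system a v = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    v = [0.0] * 3
    for r in reversed(range(3)):
        v[r] = (m[r][3] - sum(m[r][c] * v[c] for c in range(r + 1, 3))) / m[r][r]
    return v


def fit_logistic(weights, p_true, iters=50):
    """Population ML fit of P(Y=1 | x, z) = expit(g0 + gx*x + gz*z) by Newton-Raphson.

    weights[(x, z)] = P(x, z); p_true[(x, z)] = true P(Y=1 | x, z).
    """
    g = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3                     # score of the expected log-likelihood
        info = [[0.0] * 3 for _ in range(3)]  # Fisher information (negative Hessian)
        for (x, z), w in weights.items():
            f = (1.0, float(x), float(z))
            q = expit(sum(gi * fi for gi, fi in zip(g, f)))
            for i in range(3):
                grad[i] += w * (p_true[(x, z)] - q) * f[i]
                for j in range(3):
                    info[i][j] += w * q * (1 - q) * f[i] * f[j]
        step = solve3(info, grad)
        g = [gi + si for gi, si in zip(g, step)]
    return g


def effects(p_z1, ey):
    """DE, IE, TE for X: 0 -> 1 with binary Z; p_z1[x] = P(Z=1 | x), ey[(x, z)] = E(Y | x, z)."""
    pz = {x: {0: 1 - p_z1[x], 1: p_z1[x]} for x in (0, 1)}
    de = sum((ey[(1, z)] - ey[(0, z)]) * pz[0][z] for z in (0, 1))
    ie = sum(ey[(0, z)] * (pz[1][z] - pz[0][z]) for z in (0, 1))
    te = sum(ey[(1, z)] * pz[1][z] - ey[(0, z)] * pz[0][z] for z in (0, 1))
    return de, ie, te


# Hypothetical "true" probit world.
p_x1 = 0.5
p_z1 = {0: 0.3, 1: 0.7}
a0, ax, az = -1.0, 0.8, 1.2
cells = [(x, z) for x in (0, 1) for z in (0, 1)]
p_true = {(x, z): probit(a0 + ax * x + az * z) for (x, z) in cells}
weights = {(x, z): (p_x1 if x else 1 - p_x1) * (p_z1[x] if z else 1 - p_z1[x]) for (x, z) in cells}

g = fit_logistic(weights, p_true)
p_fit = {(x, z): expit(g[0] + g[1] * x + g[2] * z) for (x, z) in cells}
true_eff = effects(p_z1, p_true)
fit_eff = effects(p_z1, p_fit)
for name, t, f in zip(("DE", "IE", "TE"), true_eff, fit_eff):
    print(f"{name}: true = {t:.4f}, logistic = {f:.4f}, distortion = {f - t:+.4f}")
```

In this particular setup, the ML score equations for the intercept and the X coefficient force the fitted model to reproduce E(Y | x) exactly for each x, so TE comes out undistorted; any discrepancy shows up only in DE and IE, which is the disparity in target quantities that the Mediation Formula lets one measure directly.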

While  the  validity  of  the  Mediation  Formulas  rests  on  the  same  assump¬ 
tions  (i.e.,  no  unmeasured  confounders)  that  are  standard  requirement  in  linear 
mediation  analysis,  their  appeal  to  general  nonlinear  systems,  continuous  and 
categorical  variables,  and  arbitrary  complex  interactions  render  them  a  power¬ 
ful  tool  for  the  assessment  of  causal  pathways  in  many  of  the  social,  behavioral 
and  health-related  sciences. 